Abstract

Background: There is widespread interest in measuring organizational readiness to implement evidence-based 
practices in clinical care. However, there are a number of challenges to validating organizational measures, 
including inferential bias arising from the halo effect and method bias - two threats to validity that, while well- 
documented by organizational scholars, are often ignored in health services research. We describe a protocol to 
comprehensively assess the psychometric properties of a previously developed survey, the Organizational Readiness 
to Change Assessment. 

Objectives: Our objective is to conduct a comprehensive assessment of the psychometric properties of the 
Organizational Readiness to Change Assessment incorporating methods specifically to address threats from halo 
effect and method bias. 

Methods and Design: We will conduct three sets of analyses using longitudinal, secondary data from four partner 
projects, each testing interventions to improve the implementation of an evidence-based clinical practice. Partner 
projects field the Organizational Readiness to Change Assessment at baseline (n = 208 respondents; 53 facilities), 
and prospectively assess the degree to which the evidence-based practice is implemented. We will assess 
predictive and concurrent validity using hierarchical linear modeling and multivariate regression, respectively. For 
predictive validity, the outcome is the change from baseline to follow-up in the use of the evidence-based 
practice. We will use intra-class correlations derived from hierarchical linear models to assess inter-rater reliability. 
Two partner projects will also field measures of job satisfaction for convergent and discriminant validity analyses, 
and will field Organizational Readiness to Change Assessment measures at follow-up for concurrent validity (n = 
158 respondents; 33 facilities). Convergent and discriminant validities will test associations between organizational 
readiness and different aspects of job satisfaction: satisfaction with leadership, which should be highly correlated 
with readiness, versus satisfaction with salary, which should be less correlated with readiness. Content validity will 
be assessed using an expert panel and modified Delphi technique. 

Discussion: We propose a comprehensive protocol for validating a survey instrument for assessing organizational 
readiness to change that specifically addresses key threats of bias related to halo effect, method bias and questions 
of construct validity that often go unexplored in research using measures of organizational constructs. 



Background 

There is widespread concern among healthcare systems 
over gaps in implementing known, evidence-based prac- 
tices in clinical care [1,2]. There may be as much as a 
15 to 20-year lag, on average, before a new evidence- 
supported practice is integrated into routine care [3]. 
Evidence suggests that organizations have difficulty
systematically implementing new practices, and that the
challenge often involves coordinating change among 
multiple aspects of a practice setting, rather than simply 
failing to recognize new practices as viable and desirable 
[1,4-6]. Such complex change initiatives have moderate
to poor success rates: published reviews report a median
success rate of approximately 33%, with much
lower success for some sectors [7].
Successful change efforts are characterized by many
organizational factors, including employee and manager 
attitudes about change (to what degree it is possible and 
desirable); leadership support (making the change a 
priority); slack resources; adequate planning (clarity of 
goals and roles); and mechanisms for tracking and 
reporting progress. Some organizational scholars pro- 
pose that these factors are generally observable at the 
outset of a change initiative, and taken collectively, con- 
stitute an organization's readiness to make the change 
[8-10]. If accurately assessed, baseline organizational 
readiness could be used prognostically to predict the 
likelihood of successful change or diagnostically for for- 
mative evaluation. Many surveys have been published to 
measure organizational readiness [9,10]. However, few 
have undergone rigorous validation, notably to demon- 
strate the ability to prospectively distinguish successful 
change efforts from those that will fail [9,10]. 

In this paper, we briefly review literature on measures 
of organizational readiness for change (ORC) and discuss 
three specific threats that pose challenges for validating 
measures of organizational readiness [11-13]. Next, we 
describe our protocol for validation of a previously devel- 
oped instrument, the Organizational Readiness for 
Change Assessment (ORCA) [14], and how we address 
key threats to validity. 

Background and literature review: What we currently 
know about organizational readiness to change 

We define organizational change as planning and actions 
to alter collective behavior in the pursuit of specific objec- 
tives [15], notably the implementation of evidence-based 
clinical practice. Examples may include implementation of 
a best-practices bundle for cardiovascular disease risk 
management [16], or a collaborative care model for treat- 
ing depression in primary care [17]. Researchers frequently 
observe different levels of preparedness among organiza- 
tions adopting the same evidence-based practice [8,10]. 
This psychological, behavioral, and structural preparedness 
is what we refer to as ORC. The proximal outcome of 
ORC should be implementation effectiveness, meaning 
how effectively a clinical practice change is made [18]. 
This differs from measuring how effective the practice
change ultimately is on care provision, which we refer to
as innovation effectiveness [18], arguably affecting more
distal outcomes (e.g., improving patient satisfaction, quality
of care, efficiency or patient outcomes).

Two recent systematic literature reviews have exam- 
ined tools for measuring ORC [9,10]. A 2008 systematic 
review found 103 published peer-reviewed papers 
addressing organizational readiness, the majority being 
empirical studies, with 53 concerning healthcare settings 
[10]. They report outcomes such as increasing levels of
patient engagement with substance-abuse treatment
[19]; successful implementation of varied health service
programs by hospitals [20]; quality improvements for 
cardiac surgery programs [21]; and adoption of evi- 
dence-based treatment practices [22]. These studies 
have often reported very large effect sizes, such as an R 
of 0.47 for predicting short-term implementation of 
quality improvements for cardiac surgery programs [21], 
and an area under the receiver operating characteristic
(ROC) curve in excess of 0.84 for distinguishing successful
from unsuccessful implementation of change efforts
reported by hospital executives [20]. 

However, this research has relied almost exclusively 
on instruments that have little or no published informa- 
tion about their psychometric properties [9,10]. Where 
validation analyses have been conducted, findings have 
often been ambiguous or methodologically flawed. For 
example, studies linking ORC to outcomes often used
self-reported outcomes and measured both ORC and
outcomes after the fact [20,21], which, as we explain
below, introduces bias. In the most recent review, Weiner
and colleagues identified 43 unique instruments for
measuring ORC [10]. Seven of these instruments, sum- 
marized in Table 1, were both available in the public 
domain and had undergone systematic assessment of 
psychometric properties, including scale reliability, and 
construct, content, and criterion validities [19,23-28]. 
Yet, each of the seven had further deficits that limit 
their utility as a standard measure for studying the 
determinants of organizational change [10]. 

Issues in establishing psychometric properties of ORC 
instruments 

There is a range of widely recognized criteria for psychometric
validation of survey instruments [29,30]. In
particular, there are three psychometric tests that we 
propose are of special importance or pose unique chal- 
lenges for validating organizational construct measures: 
inter-rater agreement, predictive validation, and discri- 
minant validation. 

First, it is critical to assess the level of shared percep- 
tion in a collective phenomenon, such as organizational 
readiness. If individuals fail to share the same perception, 
then it can be argued that the phenomenon is not organizational
[31]. For this reason, organizational scholars propose
four minimum criteria for aggregating individual
survey data into collective units (e.g., teams or facilities):
a theoretical rationale that the phenomenon is collective;
appropriate item structure (i.e., items written in the perspective
of the collective as opposed to the individual);
demonstration of adequate reliability of the scale at the
team-level; and adequate inter-rater agreement [31].

Table 1 ORC instruments with published psychometrics and validation issues

Instrument: Organizational e-readiness
Description: Measures organizational members' perceptions of readiness for adoption of e-commerce.
Validation issues: Not suited to measuring implementation of general, evidence-based health service practices.
Key citations: [27,79]

Instrument: Organizational readiness
Description: Measures organizational members' perceptions of organization's data warehouse process maturity.
Validation issues: Not suited to measuring implementation of general, evidence-based health service practices.
Key citations: [28]

Instrument: Organizational readiness for change
Description: Two scales drawn from Pasmore Sociotechnical Systems Assessment Survey (STSAS) measuring innovativeness and cooperativeness.
Validation issues: The STSAS, while validated, was not designed or validated to be a measure of ORC; authors drew on two subscales they believed are related to organizational readiness.
Key citations: [24]

Instrument: TCU organizational readiness for change
Description: Measures organizational members' perceptions of the motivation for change, adequacy of resources, staff attributes, and organizational climate.
Validation issues: Extensively used, with published evidence of reliability and validity. However, results have varied, with poor scale reliability reported by recent studies, and inconsistent relationships observed between individual scales or readiness dimensions and outcomes.
Key citations: [19,22,80,81]

Instrument: Change-related commitment
Description: Measures employee's agreement and willingness to work toward a goal of organizational change.
Validation issues: Published evidence of reliability and validity, but designed for individual-level factors. Ignores the role of interdependence among the individuals involved.
Key citations: [23]

Instrument: Commitment to change
Description: Measures three dimensions of organizational members' commitment to a change: affective commitment, continuance commitment, and normative commitment.
Validation issues: Published evidence of reliability and validity, but designed for individual-level factors. Ignores the role of interdependence among the individuals involved.
Key citations: [25]

Instrument: Readiness for organizational change
Description: Measures organizational members' perceptions of the appropriateness of change, management support, self-efficacy and personal benefit.
Validation issues: Published evidence of reliability and validity, but designed for individual-level factors. Ignores the role of interdependence among the individuals involved.
Key citations: [26]

Summarized from Weiner BJ, Amick H, Lee S-YD: Conceptualization and measurement of organizational readiness for change: A review of the literature in health
services research and other fields. Medical Care Research and Review 2008, 65(4):379-436.

Second, predictive validity is the degree to which a
measure accurately predicts some outcome of interest
(e.g., objective changes in behavior). While predictive
validity is generally the sine qua non of survey validation
[15,32], research designs for predictive validation vary
widely, and some frequently used methods may introduce
threats to validity. In some studies, respondents
retrospectively answer questions about organizational
factors (i.e., the independent variables) and change outcomes
(i.e., the dependent variable) with the same instrument
at the same point in time [20,21,33,34], potentially
introducing common method bias. Common method
bias encompasses a range of biases, such as recall bias
and halo effect, that can produce spurious associations
or grossly inflate true associations [35]. Researchers disagree
about the extent to which common method variance
biases results, but estimates suggest it accounts for
18% to 26% of the observed variance in constructs measured
[36,37].

Finally, discriminant validity is 'the degree to which the 
measure is not similar to (diverges from) other measures 
that it theoretically should not be similar to' [35]. Discri- 
minant validity is particularly important in psychometric 
validation of organizational surveys because of bias from 
the 'halo effect,' a human tendency to infer specific attri- 
butes about a person or entity from one's overall impres- 
sions [11]. Halo effect has been shown to produce Pearson 
correlations of 0.47 to 0.91 among very disparate constructs
[38], and experiments have artificially induced a
halo effect in team members' evaluation of team dynamics
by manipulating information about their performance [39].

In the context of measuring ORC, our concern is that a 
halo effect could arise from knowing the outcome of the 
change, or from overall feelings toward the organization 
such as job morale or relationship quality with supervisors.
In the latter case, the source of halo effect (e.g., job
morale) may share a common cause with the performance
outcome being measured, and therefore introduce
confounding even for prospective criterion validation 
studies. 

The organizational readiness for change assessment 
(ORCA) 

In the funded study described in this protocol, we are 
using an ORC instrument developed by members of the 
study team, called the ORCA. The ORCA was initially 
developed by researchers in the Ischemic Heart Disease 
Quality Enhancement Research Initiative (IHD QUERI), 
part of a larger national initiative in the United States 
Department of Veterans Affairs Office of Research and 
Development. The original purpose of the ORCA was to 
assess organizational-level variables that were posited to 
influence implementation of evidence-based clinical prac- 
tice, focusing on specific practice innovations, such as 
increasing lipid measurement and management in 
ischemic heart disease. It has been used as part of several
evidence-based practice implementation efforts in the 
Veterans Health Administration (VA). 

The ORCA (Additional File 1) is a structured survey 
intended to assess organizational readiness to implement 
a specific, evidence-based clinical practice. It is intended 
to provide an overall indication of the likelihood of suc- 
cess at baseline, and to assess changes over time. 

Figure 1 depicts the three primary scales and 19 sub- 
scales comprising the ORCA. The survey is meant to 
be filled out by clinical and administrative staff 
involved in implementation of the evidence-based 
practice, particularly members of teams charged with 
evidence-based practice implementation. The survey is 
anchored to the specific change by an opening state- 
ment about what the practice change is expected to 
achieve, e.g., 'the ICU infection control bundle at [facil- 
ity x] will reduce nosocomial infections among ICU 
patients.' 

A detailed description of the instrument and results 
from scale reliability and factor structure analyses have 
been previously published [14], and colleagues have
reported findings that suggest the instrument may be
effective in predicting implementation outcomes [40]. 
However, the instrument has not been comprehensively 
validated. 

Objectives of the study protocol 

The objective of our study protocol is to conduct a 
comprehensive assessment of the psychometric proper- 
ties of the ORCA. Our primary aims are to: 

1. Extend current knowledge about the ORCA's measurement
reliability, as indicated by meeting or exceeding
minimum thresholds for inter-rater and
internal consistency reliabilities.

2. Extend current knowledge about the ORCA's con- 
tent validity, particularly within VA, using a modified 
Delphi technique with recognized VA and non-VA 
experts in organizational change, and empirically match- 
ing ORCA items and subscales to theoretical content 
domains. 

3. Assess four types of criterion validity for the ORCA: 
predictive, concurrent, convergent, and discriminant 
validities. 



Figure 1 ORCA scales, subscales and outcomes. This figure illustrates the composition of the ORCA scales and their hypothesized relationship
to organizational readiness for change, and subsequently to implementation outcomes.

Evidence scale: Research Evidence; Practice Experience; Patient Needs; Staff Discord Over Evidence.
Context scale: Leadership Culture; Staff Culture; Opinion Leader Culture; Leadership Practice; Evaluation / Accountability; Slack Resources.
Facilitation scale: Leadership Roles In Planning; Project Champion Roles; Leadership Roles in Support; Implementation Team Roles; Assessment; Evaluation; Implementation Plan; Communication; Project Resources.

Methods

Data and settings 

Data will be aggregated from four intervention studies 
designed to implement evidence-based practice changes 
in clinical settings within the VA. These partner projects 
are described in detail in Additional File 2 [41-71]. We
are collaborating with each partner project to ensure the 
collection of equivalent data on important organizational 
dimensions to allow us to aggregate across samples. 
These include how implementation outcomes are mea- 
sured, and the timeframe in which ORCA and imple- 
mentation outcomes are being measured. 

In each partner project, the ORCA is administered 
prospectively to providers and staff from each VA medi- 
cal center or community-based outpatient clinic site 
participating in the implementation of the
evidence-based practice. Each partner project determines its
timeline for baseline-survey collection to ensure respon- 
dents are aware of the planned practice changes and can 
meaningfully participate in the survey before implemen- 
tation activities are completed. 

All four partner projects test the effects of an external 
facilitation intervention on the implementation of an 
evidence-based practice. External facilitation is a process 
of interactive problem-solving and support by indivi- 
duals or teams that are external to the organization 
implementing the innovation [71]. It uses multiple tech- 
niques and evolves in response to variable site charac- 
teristics, resources, and barriers. 

Implementation outcomes are measured between six 
and nine months following baseline administration of the 
ORCA and initiation of external facilitation. Each partner 
project determines timing of outcome and follow-up mea- 
sures to ensure adequate time for practice changes to 
occur and to provide measurement at equivalent time- 
frames across all studies. Partner projects collect outcome 
data as the proportion of users that have implemented the 
practice change, or the proportion of cases where the 
practice change occurred. This will allow us to standardize 
outcomes as an effect size and to analyze pooled data. 
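
The passage above does not state which effect-size metric will be used to standardize the proportions. As a minimal sketch only, assuming Cohen's h (the difference between arcsine-transformed proportions) as one common choice for proportions, the change from baseline to follow-up at each site could be computed as follows. The site names and counts are hypothetical, and the metric itself is our assumption rather than the protocol's stated method.

```python
import math


def cohens_h(p_baseline: float, p_followup: float) -> float:
    """Cohen's h: difference between arcsine-transformed proportions.

    Assumed metric for illustration; the protocol does not name one.
    """
    for p in (p_baseline, p_followup):
        if not 0.0 <= p <= 1.0:
            raise ValueError("proportions must lie in [0, 1]")
    phi_base = 2.0 * math.asin(math.sqrt(p_baseline))
    phi_follow = 2.0 * math.asin(math.sqrt(p_followup))
    return phi_follow - phi_base


# Hypothetical sites: (cases where the practice change occurred, eligible cases).
sites = {
    "site_a": {"baseline": (20, 100), "followup": (45, 100)},
    "site_b": {"baseline": (30, 60), "followup": (30, 60)},
}

effect_sizes = {
    name: cohens_h(
        waves["baseline"][0] / waves["baseline"][1],
        waves["followup"][0] / waves["followup"][1],
    )
    for name, waves in sites.items()
}

for name, h in effect_sizes.items():
    print(f"{name}: h = {h:.3f}")
```

Because the transformed scale does not depend on the unit of the denominator, a site reporting the proportion of users and a site reporting the proportion of cases would both yield values on a comparable scale, which is the property pooling relies on.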

Two of the partner projects are also administering the 
ORCA at their follow-up assessment six to nine months 
following baseline, and fielding additional job satisfaction 
items for convergent and discriminant validity analyses. 

The VA's Central Institutional Review Board (CIRB) 
deemed this study exempt from the standard human 
subjects ethical research requirements. 

Analyses 

To meet our objective to comprehensively assess the 
psychometric properties of the ORCA, we will conduct 
three sets of psychometric analyses corresponding to 
our three study aims: two scale and item reliability
analyses; content validity analyses; and four criterion
validity analyses. These are summarized in Table 2. 

We propose to conduct analyses at two levels. First, 
item-scale reliability analyses, confirmatory factor analy- 
sis (for content validation), and convergent and discri- 
minant validity analyses will use individual-level data 
from the ORCA. As explained in more detail below, the 
reliability and factor analyses are based on correlations 
among items within respondents, and on correlations 
among respondents within facilities. Second, the inter-rater
reliability, predictive validity, and concurrent
validity analyses will be at the facility level,
examining differences within and between facilities on 
aggregated ORCA scales and implementation outcomes. 

ORCA scores will be tallied for each of the three 
scales at the facility level as the average of respondents' 
scores. The scores for each respondent will be tallied as 
the average of the constituent subscale scores. The aver- 
age of subscales is used instead of the average of items 
because subscales are of different lengths, and calculat- 
ing the average of the items would give relatively higher 
weight to longer subscales. ORCA scores will be treated 
as linear, continuous variables. 
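
The scoring rule above can be sketched as follows. This is an illustration under stated assumptions: the subscale names are taken from Figure 1, but the item counts and responses are hypothetical, and the function names are ours.

```python
from statistics import mean


def respondent_scale_score(subscales):
    """Average of subscale means, so every subscale carries equal weight."""
    return mean(mean(items) for items in subscales.values())


def facility_scale_score(respondents):
    """Facility-level score: average of the respondents' scale scores."""
    return mean(respondent_scale_score(r) for r in respondents)


# Hypothetical responses for one scale with two subscales of unequal length.
respondent_1 = {"research_evidence": [4, 4, 4, 4], "patient_needs": [2, 2]}
respondent_2 = {"research_evidence": [5, 5, 5, 5], "patient_needs": [3, 3]}

score_1 = respondent_scale_score(respondent_1)
facility_score = facility_scale_score([respondent_1, respondent_2])

# For contrast: averaging items directly weights the longer subscale more.
item_level_mean_1 = mean(
    item for items in respondent_1.values() for item in items
)

print(score_1, facility_score, round(item_level_mean_1, 2))
```

In this toy case the respondent's subscale-averaged score is 3, whereas the item-level average is pulled toward the four-item subscale, which is the weighting problem the averaging of subscales is meant to avoid.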

Scale and Item reliability analyses (aim one) 

We will conduct two assessments of reliability. First, we 
will assess inter-rater reliability, which poses a challenge 
for organizational measures because raters do not overlap 
organizations (i.e., raters do not serve in multiple organi- 
zations and rate each one). It is possible to attribute 
variation in response to raters within an organization, 
but not to raters between organizations. This makes tra- 
ditional measures such as Cohen's or Fleiss' kappa inap- 
propriate [72]. A solution is to use an approach that 
considers the nested nature of the data (multiple raters 
within each organization). We will use hierarchical linear 
modeling (HLM), employing an empty model to sepa- 
rately estimate variance in ORCA scale scores that is due 
to the rater, versus the organization. The reliability coefficient
is calculated from the variance estimates as the
intra-class correlation (ICC), which is the proportion of
total variance that is attributable to differences between
organizations rather than to disagreements among raters
within an organization. To the extent that raters agree, rater-level
variation is low, and the ICC will be high. This procedure
requires multiple raters for some observations, but can 
accommodate different numbers of raters per organiza- 
tion [72]. Inter-rater reliability will be assessed using data 
from all four partner projects. We will test for significant 
differences in mean reliability coefficients among the 
three ORCA scales from partner projects using z-tests. 
An additional level of nesting is present in the data: organizations
are nested within each of the four studies. The
HLM approach will also examine how much of the
variation in ORCA score across sites can be attributed to
each of the partner projects providing data.

Table 2 Overview of validation analyses for primary aims

Aim 1

Type of validation: Inter-rater reliability
Definition: The consistency of measurement results across different raters given identical conditions.
Analysis: ICC calculated from HLM to determine if respondents have higher agreement within facility and project than between.
Data source: Individual-level, baseline ORCA data from partner projects
Observations: k = 208; n = 53

Type of validation: Internal consistency reliability
Definition: The consistency of items within a given scale, with the same rater.
Analysis: Cronbach's alpha, and item-rest correlation to determine if items within subscales, and subscales within scales, correlate more strongly than between subscales/scales.
Data source: Individual-level, baseline ORCA data from partner projects
Observations: k = 208; n = 53

Aim 2

Type of validation: Content validity
Definition: A check of the instrument's items against the content domain of the construct.
Analysis (first part): Expert panel review of conceptual domains, and Delphi survey on ORCA items assessing (a) degree of match to conceptual domain, and (b) importance for understanding organizational readiness.
Data source (first part): Transcripts of expert panel discussion and structured Delphi survey
Observations (first part): n = 14 (panel members)
Analysis (second part): Confirmatory factor analysis to match items to subscales, and subscales to scales.
Data source (second part): Individual-level, baseline ORCA data from partner projects
Observations (second part): k = 208; n = 53

Aim 3

Type of validation: Predictive validity
Definition: Degree to which an instrument predicts a theoretically meaningful outcome.
Analysis: Multivariate regression in which the ORCA scales serve as the IVs, and implementation effect size as the DV.
Data source: Site-level, baseline ORCA data, and individual-level implementation outcomes
Observations: k = 146; n = 30

Type of validation: Concurrent validity
Definition: Degree to which an instrument distinguishes groups it should theoretically distinguish (e.g., low false positives and low false negatives).
Analysis: Multivariate regression in which external facilitation intervention is the IV and the ORCA scales are the DVs.
Data source: Site-level, follow-up ORCA data, and intervention cohort (external facilitation vs. control site)
Observations: k = 122; n = 28

Type of validation: Convergent validity
Definition: The degree to which an instrument performs in a similar manner to other instruments that purportedly measure the same construct.
Analysis: Multivariate regression with ORCA scales as IVs, and JSI items on satisfaction with direct supervision and senior leadership as DVs.
Data source: Individual-level, baseline ORCA and job satisfaction data
Observations: k = 158; n = 33

Type of validation: Discriminant validity
Definition: Degree to which an instrument performs in a different manner to other instruments that measure different constructs.
Analysis: Multivariate regression with ORCA scales as IVs, and overall JSI and satisfaction with pay as DVs.
Data source: Individual-level, baseline ORCA and job satisfaction data
Observations: k = 158; n = 33

IV = independent variable, DV = dependent variable, ORCA = Organizational Readiness for Change Assessment, JSI = job satisfaction index, HLM = Hierarchical
Linear Modeling, k = number of individual respondents, n = number of sites.
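
To make the inter-rater ICC concrete, the sketch below estimates the rater-level and organization-level variance components for raters nested in organizations and returns the share of total variance that lies between organizations. It is an illustration only: it uses the classical one-way random-effects ANOVA estimator rather than a fitted hierarchical linear model (which would normally be estimated by maximum likelihood), it ignores the additional nesting of organizations within studies, and the example scores are hypothetical.

```python
from statistics import mean


def icc_one_way(groups):
    """ICC(1) from a one-way random-effects ANOVA.

    groups: one list of scale scores per organization (one score per rater).
    Organizations may have different numbers of raters, but there must be
    at least two organizations and more raters than organizations.
    """
    sizes = [len(g) for g in groups]
    n_total = sum(sizes)
    k = len(groups)
    grand_mean = sum(sum(g) for g in groups) / n_total

    ss_between = sum(
        n * (mean(g) - grand_mean) ** 2 for n, g in zip(sizes, groups)
    )
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)  # rater-level variance

    # Effective group size for unbalanced designs.
    n0 = (n_total - sum(n * n for n in sizes) / n_total) / (k - 1)
    # Organization-level variance, truncated at zero.
    var_between = max((ms_between - ms_within) / n0, 0.0)

    total = var_between + ms_within
    return var_between / total if total > 0 else float("nan")


# Hypothetical facility-level data: raters agree closely within each facility.
facilities = [[3.9, 4.1, 4.0], [2.0, 2.2], [3.0, 3.1, 2.9, 3.0]]
print(round(icc_one_way(facilities), 3))
```

When raters within each organization give identical scores the function returns 1.0, and when all organizations have the same mean it returns 0.0, which matches the interpretation that agreement among raters pushes the ICC up.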

Second, internal-consistency reliability is the extent to
which items from the same hypothetical scale or subscale
correlate with each other as predicted. This is an
important assessment prior to aggregating survey items 
into subscales and scales [35]. These analyses will be 
done in two stages: first focusing on the subscales and 
secondly on the scales. Internal consistency reliability 
will be assessed with two measures of item correlation 
with a given subscale: 

(1) Cronbach's alpha is a summary measure of the aver- 
age correlation among all possible combinations of items 
divided into equal pools. It provides a rough estimate of 
the cohesiveness of a set of items. We will assess the effect 
on the Cronbach's alpha of eliminating any one item from 
its given subscale to help identify specific items that contribute
to poor reliability. (2) Item-rest correlation is the
correlation of a given item to the remaining items
collectively in its hypothesized scale or subscale, and is an
indicator of the cohesiveness of the specific item with its 
corresponding scale. It is another method to help identify 
specific items that contribute to poor reliability [73]. Cron- 
bach's alpha is a scale-level measure of reliability, and 
item-rest correlation is an item-level measure of reliability 
[73]. For the second stage, we will calculate the Cronbach's 
alpha for the overall scales (e.g., the evidence scale) as a
function of the constituent subscales (i.e., the aggregated
subscale scores). Subscales or items that contribute to
poor scale reliability may be dropped from validity analyses,
and be used to develop a shortened form of the survey
(aim five). These analyses are based on correlations
among items within-respondent, and thus should not be a 
function of a specific setting or organizational change [73]. 
For this reason, observations across the partner projects 
will be pooled for the internal-consistency reliability analyses.
Where a follow-up ORCA assessment is conducted
and more than one observation exists for an individual,
the first observation will be used. We will adhere to published
recommendations for handling missing data [30].
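
The two internal-consistency measures described above, together with the "alpha if item deleted" check, can be sketched as follows. This is a minimal illustration assuming complete cases and the standard variance-based formula for Cronbach's alpha; the four-item subscale and its responses are hypothetical, with the last item deliberately running against the others.

```python
from statistics import mean, variance


def cronbach_alpha(rows):
    """rows: one list of item scores per respondent (complete cases only)."""
    k = len(rows[0])
    item_variances = [variance([r[j] for r in rows]) for j in range(k)]
    total_variance = variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def item_rest_correlation(rows, j):
    """Correlation of item j with the sum of the remaining items."""
    item = [r[j] for r in rows]
    rest = [sum(r) - r[j] for r in rows]
    return pearson(item, rest)


def alpha_if_deleted(rows, j):
    """Cronbach's alpha recomputed with item j removed."""
    return cronbach_alpha([r[:j] + r[j + 1:] for r in rows])


# Hypothetical subscale: five respondents, four items.
responses = [
    [1, 2, 1, 4],
    [2, 2, 3, 3],
    [3, 4, 3, 3],
    [4, 4, 5, 2],
    [5, 5, 5, 1],
]

print("alpha:", round(cronbach_alpha(responses), 3))
for j in range(4):
    print(
        j,
        round(item_rest_correlation(responses, j), 3),
        round(alpha_if_deleted(responses, j), 3),
    )
```

In this toy data the fourth item has a negative item-rest correlation and removing it raises alpha sharply, which is the pattern that would flag an item as contributing to poor reliability.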

Content validity assessment (aim two) 

Content validity is the extent to which items in a mea- 
sure represent the content of interest within the concep- 
tual domain. Assessment of content validity can be 
accomplished through matching of item content to spe- 
cific units of a textual representation of the content 
domain and/or expert opinion that such matching exists 
and is adequate [32]. For ORCA, we propose to: trace
each of the 77 items to its corresponding subscale
and report on the status of matches using confirmatory
factor analysis (CFA); and convene an expert panel via
conference calls to elaborate critical domains for understanding
ORC, and use a modified Delphi technique
among a second group of experts to rate the adequacy 
of the ORCA's content coverage of those domains [74]. 

For the first step, we will use CFA to trace the items 
back to content domains. Weiner et al. recommend fac- 
tor analysis as an indicator of content validity for multi- 
dimensional constructs because it can be used to verify 
the existence of the theorized dimensions [10]. We will 
use CFA to assess the fit between data from the partner 
projects and the 19 subscales of the ORCA. Following 
recommendations from Jöreskog and Sörbom, we will
begin by tracing a single latent variable to its corresponding
observed variables (i.e., the items comprising
an individual subscale), then proceed to simultaneously 
test pairs of factors, and finally to testing the combina- 
tion of factors comprising each scale [75]. 

For the second step, the expert panel described earlier 
will participate in a roundtable discussion via conference 
call to discuss and identify the conceptual domains and 
dimensions critical for understanding ORC. The confer- 
ence call will be transcribed verbatim, and coded for con- 
sensus conceptual domains critical for understanding 
ORC. Summaries of the coded domains will be distributed 
via e-mail to expert panel members for comment and 
revision. 

A second, larger group of experts, which may include 
some participants from the expert panel, will participate in 
a modified Delphi process via e-mail to match and rate 
ORCA items and the expert-panel derived domains. The 
Delphi technique is an established method for 'forming 
consensus and defining levels of agreement about issues of 
uncertainty among groups of individuals who are sepa- 
rated by time and space' [76]. After reviewing the items 
and matched content, Delphi members will assign each 
item two scores: a score from 1 (lowest) to 10 (highest) 
representing the importance of the item for understanding 
ORC; and a categorical assessment of which conceptual 
domain it matches. Members will also be asked to comment
on the readability and accuracy of any items they
find problematic. The investigators will merge the results
and provide the Delphi members the following for each 
item: their own scores previously assigned; the Delphi 
panel median scores; the panel twenty-fifth and seventy- 
fifth percentiles; and a de-identified list of comments on 
the item. Members will then use this information to repeat 
the scoring process, free to either keep their previous 
scores or change their scores, and provide additional comments
if desired. Those who score an item outside the
twenty-fifth to seventy-fifth percentile range will be asked to
provide a written reason for their score. This scoring and
feedback cycle will be performed up to three times; if
fewer than 10% of scores change in the second round,
we will not repeat the process. The results will be presented
to Delphi members, and a final opportunity to
make written comments on items will be provided. The 
final product will be an item-by-item assessment of the 
content validity of the ORCA vis-a-vis the expert panel- 
derived domains. A major advantage of the modified Del- 
phi technique is the ability to generate high-quality con- 
sensus without the need for a physical meeting. 
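
The per-item feedback returned to Delphi members between rounds can be sketched in a few lines of Python. This is an illustrative sketch only, not part of the protocol: the function names, the dictionary layout, the use of inclusive quartiles, and the reading of the 10% stopping rule as the share of individual scores that changed between rounds are all assumptions.

```python
from statistics import median, quantiles


def summarize_item(scores):
    """Summarize one item's importance scores (1-10).

    `scores` maps a (hypothetical) member id to that member's score.
    Returns the panel median, the 25th and 75th percentiles, and the
    members whose score falls outside that interquartile range.
    """
    values = sorted(scores.values())
    # Inclusive quartiles are an assumption; the protocol names no method.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    needs_reason = sorted(m for m, s in scores.items() if s < q1 or s > q3)
    return {
        "median": median(values),
        "q1": q1,
        "q3": q3,
        "needs_reason": needs_reason,
    }


def fraction_changed(previous, current):
    """Share of scores that differ between two rounds.

    Both arguments map (member id, item id) pairs to scores. Only pairs
    present in both rounds are compared.
    """
    shared = previous.keys() & current.keys()
    if not shared:
        return 0.0
    changed = sum(1 for key in shared if previous[key] != current[key])
    return changed / len(shared)
```

Under these assumptions, a further round would be skipped when `fraction_changed` for the second round is below 0.10.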

Criterion validity analyses (aim three) 

Predictive validity is the extent to which the measure
predicts a theoretically meaningful outcome [35]. Unlike
reliability analyses, which assess correlations among
items within a respondent, or among respondents within
a facility, the criterion analyses are at the site level. For
ORCA, the outcome we wish to predict is the extent of 
implementation, which we term 'implementation out- 
come.' Psychometric assessment of predictive validity is 
concerned with the specific issue of establishing whether 
a relationship exists between the instrument and a rele- 
vant outcome. For example, an IQ test might be expected 
to predict subsequent school grades. 

To test the predictive validity of the ORCA, we will 
conduct HLM. The dependent variable is implementation 
outcome measured as an effect size. The partner projects 
will measure implementation outcome as a proportion of 
care practices changed, measured at the site level or at 
the provider level and aggregated to the site level
(described in Additional File 2), which will be trans- 
formed into an effect size based on change from baseline 
to follow-up. For example, one partner project sought to 
increase the use of cognitive behavioral therapy for 
depression; the outcome of interest is the change from 
baseline to follow-up in the percent of clinic time over 
the past 30 days that therapists report using cognitive 
behavioral therapy to treat depression [43]. We will con- 
vert change in proportions across all four projects into a 
single standardized effect size measure, Cohen's h [44]. 
Cohen's h employs an arcsine transformation of the proportion
scores, which makes differences between proportions comparable
regardless of the magnitude of those proportions.
This provides a standardized outcome that can be analyzed
in aggregate.
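
As a worked illustration, Cohen's h for a change in proportions is conventionally computed as h = 2·arcsin(√p2) − 2·arcsin(√p1). The Python sketch below applies that formula to a baseline and a follow-up proportion. The function name and the choice to sign the result as follow-up minus baseline are assumptions made for illustration; the protocol does not specify a sign convention.

```python
import math


def cohens_h(p_baseline, p_followup):
    """Cohen's h for the change from a baseline to a follow-up proportion.

    Both inputs must be proportions in [0, 1]. A positive value means the
    follow-up proportion is higher (sign convention assumed here).
    """
    for p in (p_baseline, p_followup):
        if not 0.0 <= p <= 1.0:
            raise ValueError("proportions must lie between 0 and 1")

    def arcsine_transform(p):
        # Arcsine (angular) transformation of a proportion.
        return 2.0 * math.asin(math.sqrt(p))

    return arcsine_transform(p_followup) - arcsine_transform(p_baseline)
```

With this transformation, an increase from 5% to 15% yields a larger h than an increase from 45% to 55%, even though both are ten-point changes, because the transformation stretches differences near the ends of the scale.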

Independent variables will include partner project 
sample (four categories represented by three dummy 
coded variables), and whether the site received the 
external facilitation intervention as part of the partner 
project or was a comparison site (two categories repre- 
sented by one dummy coded variable). ORCA scores 
will be entered into the equation as continuous 
variables. 
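
The coding of these predictors can be sketched as follows. This is a minimal illustration, not the authors' analysis code: the project labels "A" to "D", the choice of "A" as the reference category, and the column order are all hypothetical.

```python
# Hypothetical labels for the four partner projects; "A" is the reference.
PROJECTS = ("A", "B", "C", "D")


def design_row(orca_score, project, facilitated):
    """Build one site's predictor row.

    Columns: intercept, continuous ORCA score, facilitation dummy
    (1 = intervention site, 0 = comparison site), and three project
    dummies for the non-reference projects.
    """
    if project not in PROJECTS:
        raise ValueError(f"unknown project: {project!r}")
    project_dummies = [1.0 if project == p else 0.0 for p in PROJECTS[1:]]
    facilitation_dummy = 1.0 if facilitated else 0.0
    return [1.0, float(orca_score), facilitation_dummy] + project_dummies
```

A site from the reference project has zeros in all three project columns, which is why four categories need only three dummy variables.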

We will conduct a secondary analysis to quantify the 
size of the relationship between the ORCA and imple- 
mentation outcomes. 

Concurrent validity is the extent to which the measure is 
able to distinguish between groups that should theoreti- 
cally differ [35]. In the context of the ORCA, an important 
indication of concurrent validity will be distinguishing the 
facilities in the partner projects that receive external facili- 
tation activities (intervention sites) from those receiving 
none (control sites) [71]. The external facilitation interven- 
tion, if it is effective, should alter scores on the ORCA, 
particularly the facilitation scale, over time. In the present 
study, we will assess changes in ORCA scores from base- 
line to follow-up between sites receiving external facilita- 
tion (n = 14) and control sites (n = 14). We will test the 
hypothesis that the change in ORCA scores is positive and 
larger (meaning greater readiness for change) among facilitation
sites relative to control sites.

In the predictive validity analyses, we expect at least 30
observations (i.e., at least 30 sites). Data for 20 of the sites
have been collected. The
remaining sites come from one partner project currently
in start-up at 12 sites; in calculating our power, we have
conservatively allowed for the attrition of two of those
sites. With 30 observations, we will have 90% power to
detect an effect of ORCA score that is equal to or greater
than R² = 0.21 (with type I error rate set to 0.05,
two-tailed) [44]. We will have 80% power to detect an effect of
ORCA score that is equal to or greater than R² = 0.17
(with type I error rate set to 0.05, two-tailed). This power
calculation conservatively estimates that the other predic- 
tors (study sample and external facilitation) will account 
for no more than 15% of the variability in implementation 
effect. 
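
Power calculations of this kind are commonly expressed through Cohen's f², the variance uniquely explained by the predictor of interest divided by the variance left unexplained by the full model. The sketch below converts the thresholds quoted above into f². It assumes that 0.21 and 0.17 are increments in explained variance attributable to ORCA score and that the other predictors explain 15%; the function name is hypothetical, and this is not the authors' own calculation.

```python
def cohens_f2(r2_increment, r2_other):
    """Cohen's f-squared for one predictor set in multiple regression.

    r2_increment: variance explained by the predictor of interest
                  over and above the other predictors (assumed reading).
    r2_other:     variance explained by the other predictors.
    """
    r2_full = r2_increment + r2_other
    if not 0.0 <= r2_full < 1.0:
        raise ValueError("total explained variance must be in [0, 1)")
    return r2_increment / (1.0 - r2_full)
```

Under these assumptions, the 90% power threshold corresponds to f² of about 0.33 and the 80% threshold to f² of 0.25.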

Convergent and discriminant validities 

Convergent validity is the extent to which the measure 
converges on other measures that it theoretically should 
be similar to, most often other measures of the same or
related constructs [35]. The challenge to assessing conver- 
gent validity is that we are interested in validating the 
ORCA precisely because systematic reviews conclude 
there is a dearth of well-validated instruments [9,10]. 
Thus, as detailed below, we chose the best measures of 
similar and dissimilar constructs possible. 



Discriminant validity is particularly salient in measuring 
multi-dimensional constructs, such as ORC (19 distinct 
subscales in the ORCA), because such constructs are 
inherently broad and complex; thus, we would expect 
them to correlate with many related organizational measurements
(e.g., organizational culture). To test convergent
and discriminant validities, we will compare ORCA scales 
to employee morale as measured by the Job Satisfaction 
Index (JSI) (Appendix B). The JSI is a validated, 12-item 
short form [77] of the Job Descriptive Index scale, which
measures five dimensions of satisfaction with work in 
addition to overall satisfaction: the work itself, coworkers, 
management and leadership, opportunities for promotion, 
and pay [65]. The JSI has a track record of use in VHA, 
and is fielded annually in the All Employee Survey. We 
hypothesize that ORC may be related to job satisfaction; 
organizations that are better prepared to effectively imple- 
ment change may be more satisfying places to work [10]. 
However, we should observe different relationships 
between ORC and particular dimensions of job satisfac- 
tion, and these different relationships with dimensions of 
job satisfaction provide a compelling test of convergent 
and discriminant validities. For example, several of the 
ORCA subscales assess roles and characteristics of organi- 
zational leadership. Therefore, we would expect ORCA 
scores to have a strong, positive correlation (R > 0.20) to 
JSI measures of satisfaction with management and leader- 
ship. To test this hypothesis, we will build separate regres- 
sion models, with the three ORCA scales predicting JSI 
satisfaction with management and leadership. As before, 
we will have sufficient power to detect medium-sized 
(R² = 0.15) or larger effects.

Conversely, level of employee pay is largely prescribed 
by General Schedule pay tables for federal employees, 
occupation and tenure, and is an individual-level vari- 
able, not an organizational-level one. Therefore we 
expect little or no significant association (R < 0.10) 
between ORCA and a JSI measure of satisfaction with 
pay. If the ORCA scales, particularly context, have 
equally strong correlations with measures of satisfaction 
with leadership and pay, it suggests that respondents 
may be inferring answers to ORCA items from their 
overall feelings of satisfaction with their work. 

Overall job satisfaction will be a function of satisfaction 
with pay, leadership, and a range of other factors, such as 
the work itself and relationships with coworkers [65], 
which may be correlated with ORC, but should not be as
strongly correlated as satisfaction with leadership, which
is a dimension specifically measured in the ORCA.
Therefore, we hypothesize that ORC will have a significant
but moderate relationship (R² = 0.10 to 0.20) with
overall job satisfaction. In sum, we expect to see the lar- 
gest relationship between ORCA scales and satisfaction 
with direct supervision and senior leadership, and the
smallest relationship to satisfaction with pay, with the
relationship to overall job satisfaction falling somewhere 
in between. 

Discussion 

The proposed study will conduct a battery of psychometric
validation analyses on a promising survey instrument
to assess ORC. The protocol focuses on three
psychometric practices that we argue pose particular 
challenges for validation of measures of organizational 
constructs, or are rarely completed: inter-rater agree- 
ment, predictive validation using prospective data, and 
convergent and discriminant validation. By conducting 
this research, we address a noted gap in the literature 
[9,10,13], and contribute to a stronger scientific base for 
implementation research. 

Potential limitations 

The proposed study has two limitations. The first limita- 
tion is our reliance on aggregated data from four partner 
projects. It introduces potential challenges to both analyses 
and study management. The partner projects may contri- 
bute non-equivalent data resulting from either differences 
in data collection methods or fundamental differences in 
the study samples. To mitigate this threat, we engaged 
partner projects in the earliest stages of design of the proposed
study, and recruited the PIs of the partner projects
to serve as co-investigators on the proposed validation 
study. This included multiple conversations to ensure 
familiarity with the specifics of the partner projects, 
including the ORCA administration procedures, uses of 
the ORCA data, and challenges encountered. As a result, 
we were able to ensure a level of comparability of study 
measurements and outcomes that would not be possible 
by simply aggregating secondary data. 

At the same time, capitalizing on data from multiple, 
real-world implementation projects has some advantages.
By partnering with existing and planned implementation 
projects, the proposed study will validate the ORCA 
against real, not hypothetical, implementation outcomes.
Using prospective, real-world data increases our confi- 
dence that positive findings will not be the result of a 
spurious halo effect, and consequently that the findings 
will be applicable to those doing implementation work. 

In addition, pooling data from multiple studies likely 
produces more generalizable results owing to the diver- 
sity of the partner projects. By design, this study encom- 
passes multiple implementation projects, and avoids the 
threat that reliability and validity findings are unique to a 
specific change, set of actors, or setting, that would make 
them non-generalizable to other settings or populations. 

The second limitation is the sample size, which will be 
small relative to retrospective study designs and validation
studies that are respondent level and not organizational
level. A small sample poses particular challenges for criter- 
ion validation. While larger samples are, all things being 
equal, preferable, the central issue is what is necessary to 
infer criterion validity. A larger sample would be necessary 
to account for small (but statistically significant) variance 
in our proposed models. However, for the ORCA to be of 
value operationally to the VA, a large relationship is 
needed. If the ORCA fails to account for at least 15% of 
the variation in implementation (the level we set in our 
power calculations) in a relatively simple model, we argue 
that it is unlikely to be operationally useful. Accounting 
for small amounts of variance, while of interest academi- 
cally, will not be useful to decision making in how to bet- 
ter engage in the implementation of evidence-based 
programs. 

We also briefly note a methodological choice about
the basic psychometric approach we propose. These 
analyses represent a classical test-theory approach, 
whereas much contemporary psychometric work is 
based on item response theory. We propose a classical 
test-theory approach because most applications of item 
response theory focus on unidimensional scales and 
address research goals such as identification of items 
that are subject to group biases, or creation of banks of 
items that can be used in adaptive testing. Given that 
our objective is to create a single measure comprising 
multiple dimensions, item response theory methods add 
complexity without providing an advantage over a classi- 
cal approach [78]. 

Conclusions 

In this paper, we propose a comprehensive protocol for 
validating a survey instrument for assessing ORC. This
protocol specifically addresses key threats of bias related 
to halo effect, method bias, and questions of construct 
validity that often go unexplored in research using mea- 
sures of organizational constructs. The methods presented 
in this protocol are broadly applicable to validation of sur- 
veys to measure other organizational constructs, such as 
organizational culture, climate for safety, and team func- 
tioning. We believe this protocol can serve as a survey 
validation model for a range of organizational constructs. 

Additional material 



Additional file 1: Copy of the Organizational Readiness to Change 
Assessment instrument. This file is a PDF format of the Organizational 
Readiness to Change Assessment instrument with annotations about 
where the instrument is to be customized. 

Additional file 2: Description of four partner projects. This file is a
PDF document describing each of the four partner projects contributing 
data to the study for the described protocol, including the project aims, 
methods and details about the use of the ORCA.